SMOTE vs. KNNOR: An evaluation of oversampling techniques in machine learning
Authors
Abstract
The increasing availability of big data has led to the development of applications that make human life easier. In order to process this data correctly, it is necessary to extract useful and valid information from large data warehouses through a knowledge discovery in databases (KDD) process. Data mining is an important part of this process and involves discovering unknown patterns and developing models. The quality of the data used in supervised machine learning algorithms plays a significant role in determining the success of predictions. One factor that improves the quality of the data is a balanced dataset, where the input values are distributed close to each other. However, in practice, many datasets are unbalanced. To overcome this problem, oversampling techniques are used to generate synthetic data that is as close to real data as possible. In this study, we compared the performance of two oversampling techniques, SMOTE and KNNOR, on a variety of datasets using different machine learning algorithms. Our results showed that the use of SMOTE and KNNOR did not always improve the accuracy of the model. In fact, on many datasets, these techniques resulted in a decrease in accuracy. However, on certain datasets, both techniques were able to increase accuracy. These results indicate that the effectiveness of oversampling techniques varies depending on the specific dataset and machine learning algorithm being used. Therefore, it is crucial to assess these methods on a case-by-case basis to determine the best approach for a given dataset and algorithm.
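To make the idea of generating synthetic minority-class data concrete, the following is a minimal sketch of SMOTE-style interpolation, in which each new point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is an illustration only, not the implementation evaluated in the study, and it does not reproduce KNNOR. The function name smote_like_oversample and its parameters (n_new, k, seed) are assumptions introduced here.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority samples X_min.

    Hypothetical helper for illustration: each synthetic point lies on the
    segment between a randomly chosen minority sample and one of its k
    nearest minority neighbours. Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random fraction in [0, 1) along each segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                         [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
    synthetic = smote_like_oversample(minority, n_new=20, k=3)
    print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the range of the observed minority data. Whether adding such points helps a classifier depends on the dataset and the algorithm, which is the question the abstract addresses.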
Similar resources
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification a...
Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE
Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In th...
Development and implementation of an optimized control strategy for induction machine in an electric vehicle
In the area of automotive engineering there is a tendency toward more electrification of the power train. In this work, control of an induction machine for electric vehicle applications is investigated. Through the changing operating point of the machine, adapting the rotor magnetization current seems to be useful to increase the machine's efficiency. In the literature there are many approaches wh...
An Evaluation of Machine Learning Techniques in Intrusion Detection
ACKNOWLEDGEMENTS I would like to thank Gabor Karsai, my advisor, for all of his help on this project. Our discussions on intrusion detection and machine learning techniques allowed me to recognize areas I had overlooked and pointed out interesting areas to explore. I would also like to thank Dr. Fisher, my second reader, for his input on the experiments and thesis background. I would like to th...
An Evaluation Study of Machine Learning Techniques for Identifying Spam
In this work, we investigate the use of two kinds of machine learning techniques, Decision Trees and Naive Bayes, applied to the problem of spam classification. We first consider building a decision tree for this purpose and then investigate building an ensemble of decision trees using boosting. Decision trees are seen to give fairly good classification accuracy of around 92% and with the use of...
Journal
Journal title: Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi
Year: 2023
ISSN: 2146-538X
DOI: https://doi.org/10.17714/gumusfenbil.1253513